In [ ]:
import librosa
import numpy as np
from IPython.display import Audio
import matplotlib.pyplot as plt
import holoviews as hv
hv.extension('bokeh')
file_names = ['audio/03-01-01-01-01-02-01.wav',
              'audio/20 - 20,000 Hz Audio Sweep Range of Human Hearing.mp3',
              'audio/videoplayback.mp3']
Time-Series Data
The loaded data, audio_data, is a one-dimensional array of amplitude (magnitude) values sampled over time.
Does this mean that all the information about the sound we hear is encompassed in this representation?
Is what we hear merely the magnitude of these sounds?
Although this representation appears simple, it can be quite challenging and unintuitive to work with.
Here are some issues encountered:
- The interpretation, while straightforward, is not intuitive. For instance, what does a pattern of high amplitude followed by a sudden drop signify? And what does a gradual increase in amplitude imply?
- Even a slight time shift can lead to significantly different analyses.
- What we see here is amplitude, but how do we perceive sound? It is not immediately clear how these amplitude variations translate into the sounds and tones we perceive and understand.
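The time-shift issue can be illustrated with a minimal sketch (using a synthetic 440 Hz tone, not the notebook's audio files): a signal and a slightly shifted copy sound identical, yet their sample-by-sample difference is large, while their magnitude spectra remain essentially the same.

```python
import numpy as np

sr = 22050                        # sample rate (Hz), hypothetical
t = np.arange(sr) / sr            # one second of time stamps
tone = np.sin(2 * np.pi * 440 * t)   # a pure 440 Hz sine
shifted = np.roll(tone, 10)          # circular shift by 10 samples (~0.45 ms)

# Sample-wise, the two signals look very different...
distance = np.linalg.norm(tone - shifted) / np.sqrt(len(tone))

# ...yet their magnitude spectra are numerically identical,
# because a circular shift only changes the phase of each FFT bin.
spec_a = np.abs(np.fft.rfft(tone))
spec_b = np.abs(np.fft.rfft(shifted))
spec_diff = np.max(np.abs(spec_a - spec_b))
```

Here `distance` comes out well above zero even though the shift is imperceptible, which is one motivation for moving from the raw waveform to frequency-domain representations.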
In [ ]:
for file_name in file_names:
    audio_data, sample_rate = librosa.load(file_name)
    # Build the time axis directly from the sample indices;
    # librosa.times_like assumes frame-based features (hop_length=512 by
    # default), so it would stretch the time axis for raw samples.
    time = np.arange(len(audio_data)) / sample_rate
    plot = hv.Curve((time, audio_data)).opts(
        width=1100, height=400, title="Waveform: " + file_name)
    display(plot)
    display(Audio(data=audio_data, rate=sample_rate, autoplay=False))